Introduction to Lesson 3: Tackling Non-Linear Classification

We are moving beyond the limits of linear models, which struggle to classify data that cannot be separated by a straight line. Today we apply the PyTorch workflow to build a Deep Neural Network (DNN) capable of learning the complex, non-linear decision boundaries that real-world classification tasks require.

1. Visualizing the Need for Non-Linearity

Our first step is to create a challenging synthetic dataset, such as the two-moons distribution, to show visually why simple linear models fail. Each class forms a crescent that curls around the other, so no single straight line can separate them. This setup forces us to use deep architectures to approximate the intricate curve needed to separate the classes.

Data Properties

Data Structure: Features of the synthetic data (for example, a $1000 \times 2$ tensor for $1000$ samples with 2 features).
Output Type: A single probability value per sample, typically torch.float32, representing class membership.
Goal: Learn a curved decision boundary through layered computation.
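
The properties above can be checked with a small sketch that generates two-moons data directly in PyTorch. The helper name `make_two_moons`, the noise level, and the seed are assumptions chosen for illustration, not part of the lesson; scikit-learn's `make_moons` is a common ready-made alternative.

```python
import math
import torch

def make_two_moons(n_samples=1000, noise=0.1, seed=0):
    """Hypothetical helper: returns features (n_samples, 2) and labels (n_samples, 1)."""
    gen = torch.Generator().manual_seed(seed)
    n_upper = n_samples // 2
    n_lower = n_samples - n_upper

    # Upper moon: half circle traced from angle 0 to pi.
    t_upper = torch.linspace(0, math.pi, n_upper)
    upper = torch.stack([torch.cos(t_upper), torch.sin(t_upper)], dim=1)

    # Lower moon: mirrored half circle, shifted so the two arcs interlock.
    t_lower = torch.linspace(0, math.pi, n_lower)
    lower = torch.stack([1 - torch.cos(t_lower), 0.5 - torch.sin(t_lower)], dim=1)

    X = torch.cat([upper, lower], dim=0)
    X = X + noise * torch.randn(X.shape, generator=gen)  # Gaussian jitter (assumed level)
    y = torch.cat([torch.zeros(n_upper, 1), torch.ones(n_lower, 1)], dim=0)
    return X, y

X, y = make_two_moons()
print(X.shape, X.dtype)  # features: one row per sample, two columns
print(y.shape, y.dtype)  # labels: one column, stored as float for use with BCE
```

Storing the labels as a float column of shape $(N, 1)$ keeps them aligned with a single-probability model output.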

The Power of Non-Linear Activations

The core principle of DNNs is introducing non-linearity in the hidden layers through functions such as ReLU. Without these activations, stacking layers would simply yield one large linear model, regardless of depth, because a composition of linear (affine) maps is itself a single affine map.
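
To make the collapse concrete: for two layers without an activation, $W_2(W_1 x + b_1) + b_2 = (W_2 W_1)x + (W_2 b_1 + b_2)$, which is a single linear layer. The sketch below checks this numerically; the layer widths, batch size, and seed are arbitrary assumptions.

```python
import torch
from torch import nn

torch.manual_seed(0)

layer1 = nn.Linear(2, 8)   # assumed widths, for illustration only
layer2 = nn.Linear(8, 1)
x = torch.randn(5, 2)

with torch.no_grad():
    # Two stacked linear layers with no activation in between.
    stacked = layer2(layer1(x))

    # The equivalent single layer: W = W2 W1, b = W2 b1 + b2.
    W = layer2.weight @ layer1.weight               # shape (1, 2)
    b = layer2.weight @ layer1.bias + layer2.bias   # shape (1,)
    collapsed = x @ W.T + b

    # Inserting ReLU between the layers breaks this equivalence in general.
    with_relu = layer2(torch.relu(layer1(x)))

print(torch.allclose(stacked, collapsed, atol=1e-5))
print(torch.allclose(with_relu, collapsed, atol=1e-5))
```

The first comparison holds for any weights; the second depends on the random draw, since ReLU only changes the result where a hidden pre-activation is negative.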

Question 1

What is the primary purpose of the ReLU activation function in a hidden layer?

Introduce non-linearity so deep architectures can model curves

Speed up matrix multiplication

Ensure the output remains between 0 and 1

Normalize the layer output to a mean of zero

Question 2

Which activation function is typically used in the output layer of a binary classification model to produce a probability?

Sigmoid

Softmax

ReLU

Question 3

Which loss function corresponds directly to a binary classification problem using a Sigmoid output?

Binary Cross Entropy Loss (BCE)

Mean Squared Error (MSE)

Cross Entropy Loss

Challenge: Designing the Core Architecture

Integrating architectural components for non-linear learning.

You must build an nn.Module for the two-moons task. Input features: 2. Output units: 1 (the probability of the positive class).

Step 1

Describe the flow of computation for a single hidden layer in this DNN.

Solution:
Input $\to$ Linear Layer (weight matrix and bias) $\to$ ReLU Activation $\to$ Output to Next Layer.
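
This flow can be written out by hand as a minimal sketch. The hidden width of 16 and the batch of 4 random samples are assumptions for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

hidden = nn.Linear(in_features=2, out_features=16)  # 16 units is an assumed width
x = torch.randn(4, 2)                               # placeholder batch: 4 samples, 2 features

# Linear step: multiply by the weight matrix and add the bias.
z = x @ hidden.weight.T + hidden.bias   # pre-activations, shape (4, 16)

# Non-linear step: ReLU zeroes out every negative pre-activation.
a = torch.relu(z)                       # activations passed to the next layer

# The manual computation matches calling the layer and then ReLU.
matches = torch.allclose(a, torch.relu(hidden(x)), atol=1e-5)
print(a.shape, matches)
```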

Step 2

What must the final layer size be if the input shape is $(N, 2)$ and we use BCE loss?

Solution:
The output layer must have a single output unit, so a batch of shape $(N, 2)$ produces an output of shape $(N, 1)$: one probability score per sample, matching the label shape.
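
One possible way to assemble these pieces is sketched below. The class name `MoonClassifier`, the two hidden layers, and the width of 16 are assumptions rather than the lesson's reference solution; the batch is random placeholder data used only to check shapes.

```python
import torch
from torch import nn

class MoonClassifier(nn.Module):  # hypothetical name
    def __init__(self, hidden_units=16):  # assumed hidden width
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden_units),             # 2 input features
            nn.ReLU(),                              # non-linearity in hidden layer 1
            nn.Linear(hidden_units, hidden_units),
            nn.ReLU(),                              # non-linearity in hidden layer 2
            nn.Linear(hidden_units, 1),             # single output unit
            nn.Sigmoid(),                           # squashes the score into (0, 1)
        )

    def forward(self, x):
        return self.net(x)

torch.manual_seed(0)
model = MoonClassifier()

X = torch.randn(8, 2)                     # placeholder batch, shape (N, 2)
y = torch.randint(0, 2, (8, 1)).float()   # placeholder labels, shape (N, 1)

probs = model(X)                          # shape (N, 1), one probability per sample
loss = nn.BCELoss()(probs, y)             # BCE expects matching shapes and float targets
print(probs.shape, loss.item())
```

A common variant drops the final `nn.Sigmoid()` and uses `nn.BCEWithLogitsLoss`, which applies the sigmoid inside the loss for better numerical stability.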